Reduction Techniques for Graph-Based Convex Clustering
نویسندگان
چکیده
The Graph-based Convex Clustering (GCC) method has gained increasing attention recently. The GCC method adopts a fused regularizer to learn the cluster centers and obtains a geometric clusterpath by varying the regularization parameter. One major limitation is that solving the GCC model is computationally expensive. In this paper, we develop efficient graph reduction techniques for the GCC model to eliminate edges, each of which corresponds to two data points from the same cluster, without solving the optimization problem in the GCC method, leading to improved computational efficiency. Specifically, two reduction techniques are proposed according to tree-based and cyclic-graph-based convex clustering methods separately. The proposed reduction techniques are appealing since they only need to scan the data once with negligibly additional cost and they are independent of solvers for the GCC method, making them capable of improving the efficiency of any existing solver. Experiments on both synthetic and real-world datasets show that our methods can largely improve the efficiency of the GCC model.
منابع مشابه
A partition-based algorithm for clustering large-scale software systems
Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...
متن کاملModified Convex Data Clustering Algorithm Based on Alternating Direction Method of Multipliers
Knowing the fact that the main weakness of the most standard methods including k-means and hierarchical data clustering is their sensitivity to initialization and trapping to local minima, this paper proposes a modification of convex data clustering in which there is no need to be peculiar about how to select initial values. Due to properly converting the task of optimization to an equivalent...
متن کاملخوشهبندی اسناد مبتنی بر آنتولوژی و رویکرد فازی
Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...
متن کاملDistributed Graph Clustering and Sparsification
Graph clustering is a fundamental computational problem with a number of applications in algorithm design, machine learning, data mining, and analysis of social networks. Over the past decades, researchers have proposed a number of algorithmic design methods for graph clustering. Most of these methods, however, are based on complicated spectral techniques or convex optimisation, and cannot be d...
متن کاملModelling Decision Problems Via Birkhoff Polyhedra
A compact formulation of the set of tours neither in a graph nor its complement is presented and illustrates a general methodology proposed for constructing polyhedral models of decision problems based upon permutations, projection and lifting techniques. Directed Hamilton tours on n vertex graphs are interpreted as (n-1)- permutations. Sets of extrema of Birkhoff polyhedra are mapped to tours ...
متن کامل